feat(fetch): use HTTP content negotiation to request native markdown #4052
Open
sidney wants to merge 1 commit into modelcontextprotocol:main from
Send Accept: text/markdown, text/html;q=0.9, */*;q=0.8 with each fetch request. When the server responds with Content-Type: text/markdown (with or without a charset parameter), return the body as-is, skipping the readability + markdownify extraction. Otherwise the existing pipeline runs unchanged. Servers that don't perform content negotiation ignore the Accept header and respond as they always would, so this is fully backwards-compatible.
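The preference order encoded in that Accept value can be checked by parsing its q-values. The following is a minimal sketch, not code from this PR: `parse_accept` is a hypothetical helper, and it relies on the HTTP rule that a media range without an explicit `q` parameter defaults to 1.0. It does not handle malformed q-values.

```python
# Accept value quoted from the PR description.
ACCEPT_HEADER = "text/markdown, text/html;q=0.9, */*;q=0.8"


def parse_accept(value: str) -> list[tuple[str, float]]:
    """Return (media_range, q) pairs sorted from most to least preferred.

    Hypothetical helper for illustration only.
    """
    ranked = []
    for item in value.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, raw_value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(raw_value)
        ranked.append((media_range, q))
    # sort() is stable, so ranges with equal q keep their header order.
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


if __name__ == "__main__":
    for media_range, q in parse_accept(ACCEPT_HEADER):
        print(f"{media_range:15} q={q}")
```

Markdown comes first because it carries the implicit q of 1.0, HTML follows at 0.9, and the wildcard keeps every other representation acceptable as a last resort.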
Send `Accept: text/markdown, text/html;q=0.9, */*;q=0.8` with each fetch request and pass the body through unchanged when the server responds with `Content-Type: text/markdown`.

Description
Adds detection of servers that implement `text/markdown` and fast-tracks the Markdown content they send natively.

Fully backwards compatible. No change to processing for servers that don't send Markdown.

`fetch_url` now advertises a Markdown preference via the `Accept` header. When the server responds with `Content-Type: text/markdown` (with or without a charset parameter), the body is returned as-is, skipping the readability + markdownify pipeline. Anything else falls through to the existing extraction unchanged.

Strict media-type matching: only `text/markdown` triggers the short-circuit. `text/markdown; charset=utf-8` qualifies; non-standard variants like `text/x-markdown` do not. There's a test pinning this contract.

Server Details
`fetch` tool's underlying request and response handling)

Motivation and Context
Some servers can deliver native Markdown directly: Cloudflare zones with Markdown for Agents enabled, content-negotiating CMSes, raw-content endpoints, and so on. Today `mcp-server-fetch` ignores that and runs every response through readability + markdownify, which is lossy and unnecessary when the server is already offering Markdown.

Cloudflare's own measurements report ~80% token reduction compared to the equivalent HTML: their docs blog post is 16,180 tokens as HTML vs 3,150 tokens as Markdown. Even outside the Cloudflare case, any site that already speaks `text/markdown` benefits from the body being passed through unmodified rather than round-tripped through HTML extraction.

`Accept` is a hint: servers that don't perform content negotiation simply respond with whatever they always would, so the change is fully backwards-compatible.

How Has This Been Tested?
Unit tests: existing `TestFetchUrl` tests pass unchanged. Four new tests appended:
- `test_fetch_markdown_returns_early`: `text/markdown` body returned as-is, no extraction, empty prefix.
- `test_fetch_markdown_with_charset`: `text/markdown; charset=utf-8` qualifies.
- `test_fetch_x_markdown_does_not_match`: `text/x-markdown` falls through to the raw-fallback branch (pins the strict-matching contract).
- `test_fetch_sends_accept_header`: verifies the Accept header is sent and that markdown is preferred over HTML in the q-list.

MCP Inspector: `npx @modelcontextprotocol/inspector uv run mcp-server-fetch`. Verified two scenarios:
- https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/ (a Markdown-for-Agents-enabled origin) returns native markdown directly. The same URL without the Accept header serves HTML, which is easy to confirm with `curl -sI`.
- https://example.com/ flows through readability + markdownify and produces the expected structured markdown.

LLM client: patched `mcp-server-fetch` installed in Claude Desktop, exercised against both URLs above. The fast path returns clean markdown with YAML frontmatter (Cloudflare's edge-converted output); the readability path returns the existing structured extraction.

Breaking Changes
None. The change is fully backwards-compatible: servers that don't perform content negotiation ignore the `Accept` header and respond as they always would. Users do not need to update MCP client configurations.

Types of changes
Checklist
(Note on the last two: no new error paths or environment/configuration options are introduced — the fast-path inherits the existing GET call's error handling, and the only added logic is a Content-Type string match.)
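For reviewers, the Content-Type match described in this PR could look roughly like the sketch below. This is an illustration under assumptions, not the code in the diff: `is_markdown_content_type` is a hypothetical name, and lowercasing the media type is an added assumption based on media types being case-insensitive.

```python
def is_markdown_content_type(content_type: str) -> bool:
    """Return True only for the exact media type text/markdown.

    Hypothetical helper: parameters such as charset are ignored, and
    look-alike types such as text/x-markdown are rejected.
    """
    # Drop any parameters after the first ';', then normalise.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/markdown"


if __name__ == "__main__":
    samples = [
        "text/markdown",
        "text/markdown; charset=utf-8",
        "text/x-markdown",
        "text/html; charset=utf-8",
        "",
    ]
    for sample in samples:
        print(repr(sample), is_markdown_content_type(sample))
```

Comparing the parsed media type for equality, rather than testing for a substring, keeps the match strict: a substring test on "markdown" would also accept `text/x-markdown`.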
Additional context
Cloudflare's Markdown for Agents documentation page is itself served from a Markdown-for-Agents-enabled origin, so it works as a live test target.
Same URL, two different Content-Types depending on the Accept header.
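The same effect can be reproduced offline with a toy origin. The sketch below is a stand-in for the live target rather than a model of Cloudflare's behaviour: `NegotiatingHandler`, `fetch_content_type`, and `demo` are hypothetical names, and the server's negotiation is a simplified substring check where a real server would weigh q-values.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ACCEPT_HEADER = "text/markdown, text/html;q=0.9, */*;q=0.8"


class NegotiatingHandler(BaseHTTPRequestHandler):
    """Toy origin: serves Markdown only when the client asks for it."""

    def do_GET(self):
        accept = self.headers.get("Accept", "")
        if "text/markdown" in accept:  # simplified negotiation
            body, ctype = b"# Hello\n", "text/markdown; charset=utf-8"
        else:
            body = b"<html><body><h1>Hello</h1></body></html>"
            ctype = "text/html; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Vary", "Accept")  # representation depends on Accept
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_content_type(url, accept=None):
    """GET the URL and return the response Content-Type header."""
    headers = {"Accept": accept} if accept else {}
    request = urllib.request.Request(url, headers=headers)
    # Bypass any proxy configured in the environment for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return response.headers.get("Content-Type", "")


def demo():
    """Fetch the same local URL with and without the Accept header."""
    server = HTTPServer(("127.0.0.1", 0), NegotiatingHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        return fetch_content_type(url, ACCEPT_HEADER), fetch_content_type(url)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    with_accept, without_accept = demo()
    print("with Accept:   ", with_accept)
    print("without Accept:", without_accept)
```

The `Vary: Accept` header is included because a negotiated response differs by request header, which caches need to know about.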
The `raw=True` flag is unaffected: when a server returns `text/markdown`, the fast-path returns the raw body (which is what `raw=True` would also produce); when a server returns HTML, `raw=True` continues to short-circuit to the raw HTML branch as before.
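The interaction between the fast path and `raw=True` can be summarised as a small decision function. This is a sketch of the behaviour as described in this PR, not the actual control flow in `mcp-server-fetch`: `select_branch` and the branch labels are hypothetical, and the ordering of the checks is an assumption.

```python
def select_branch(is_markdown: bool, is_html: bool, raw: bool) -> str:
    """Name the branch a response would take (hypothetical labels).

    is_markdown: the response Content-Type is text/markdown
    is_html:     the response was detected as HTML
    raw:         the caller passed raw=True
    """
    if is_markdown:
        # Fast path: body returned as-is, whatever the raw flag says.
        return "markdown-passthrough"
    if is_html and not raw:
        # Existing pipeline: readability + markdownify.
        return "extract"
    # raw=True on HTML, or any other content type (e.g. text/x-markdown).
    return "raw"


if __name__ == "__main__":
    for is_markdown, is_html in [(True, False), (False, True), (False, False)]:
        for raw in (False, True):
            branch = select_branch(is_markdown, is_html, raw)
            print(f"markdown={is_markdown!s:5} html={is_html!s:5} raw={raw!s:5} -> {branch}")
```

Under these assumptions, `raw` only changes the outcome for HTML responses, which matches the claim that the flag is unaffected by the fast path.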